TRUNAJOD: A text complexity library to enhance natural language processing
Similar resources
Web Text Corpus for Natural Language Processing
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English. We show that for context-sensitive spelling correction the Web Corpus results are better than using a search engine. For t...
Natural Language Processing Complexity and Parallelism
This paper reviews the processes involved in Natural Language Processing (NLP). It then demonstrates the various kinds of choices that need to be made during the execution of the word morphology, the syntactic text analysis, or text generation components. It compares the time complexity of traditional serial algorithms and examines the possible expected gain in some corresponding parallel counter...
Natural Language Processing: Structure and Complexity
We introduce a method for analyzing the complexity of natural language processing tasks, and for predicting the difficulty of new NLP tasks. Our complexity measures are derived from the Kolmogorov complexity of a class of automata, which we call meaning automata, whose purpose is to extract relevant pieces of information from sentences. Natural language semantics is defined only relative to the set of question...
Applying Natural Language Processing to Text Analysis in Educational Context
In this paper, we use the term academic text to refer to any free text composed in an academic setting, covering the whole spectrum from first year students’ reviews of available scientific texts up to a scientist’s struggling, but still fragile text concerning his or her new discoveries. These academic texts can be processed by various information technologies. We can now outline the landscape...
Journal
Journal title: Journal of Open Source Software
Year: 2021
ISSN: 2475-9066
DOI: 10.21105/joss.03153